# Large Language Model

WeClone
WeClone is a project for fine-tuning a large language model on WeChat chat logs, primarily aimed at high-quality voice cloning and digital avatars. It combines WeChat voice messages with a 0.5B-parameter model, letting users interact with their digital avatar through a chatbot. The technology has significant application value in digital immortality and voice cloning, allowing a person's avatar to keep communicating with others even in their absence. The project is iterating rapidly, suits users interested in AI and language models, and is currently free and under active development.
Speech and Language Processing
40.6K

Dream 7B
Dream 7B is the latest diffusion large language model jointly released by the NLP group of the University of Hong Kong and Huawei Noah's Ark Lab. It performs strongly in text generation, especially in complex reasoning, long-horizon planning, and contextual coherence. The model adopts advanced training methods, offering strong planning and flexible reasoning abilities that better support a range of AI applications.
AI Model
51.6K
Chinese Picks

Argo
Xark-Argo is a desktop client designed to help users easily build and use their own large language models. It supports multiple operating systems, including macOS and Windows, and provides strong local model deployment capabilities. By integrating Ollama, users can download open-source models with one click; it also supports large-model APIs such as ChatGPT, Claude, and SiliconFlow, greatly lowering the barrier to entry. The product suits individual and enterprise users who need to process text efficiently and manage knowledge, offering high flexibility and scalability. There is currently no clear pricing information, but its positioning suggests it may target mid-to-high-end users.
Development & Tools
108.7K

NotaGen
NotaGen is an innovative symbolic music generation model that enhances music generation quality through three stages: pre-training, fine-tuning, and reinforcement learning. Utilizing large language model technology, it can generate high-quality classical music scores, bringing new possibilities to music creation. The model's main advantages include efficient generation, diverse styles, and high-quality output. It is applicable in music creation, education, and research, with broad application prospects.
Music Generation
117.6K

AoT
Atom of Thoughts (AoT) is a novel reasoning framework that turns the reasoning process into a Markov process by representing solutions as combinations of atomic problems. Through its decomposition and contraction mechanisms (sketched after this entry), it significantly improves the performance of large language models on reasoning tasks while reducing wasted computation. AoT can be used as a standalone reasoning method or as a plugin for existing test-time augmentation methods, flexibly combining the strengths of different approaches. The framework is open source and implemented in Python, making it well suited for researchers and developers to experiment with in natural language processing and large language model work.
Model Training and Deployment
72.9K
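To make the decomposition-and-contraction idea concrete, here is a minimal sketch of an AoT-style loop. It assumes only an abstract `llm(prompt) -> str` completion function; the prompts and step structure are illustrative, not the project's actual implementation:

```python
# Illustrative AoT-style loop: each step decomposes the current question into
# atomic subquestions, solves them, and contracts the results into a new,
# self-contained question. The Markov property: no growing history is kept.
from typing import Callable

def aot_solve(question: str, llm: Callable[[str], str], max_steps: int = 3) -> str:
    current = question
    for _ in range(max_steps):
        # Decompose: ask for the independent atomic subquestions of the problem.
        subqs = llm(f"List the independent atomic subquestions of:\n{current}")
        # Solve each atomic subquestion in isolation.
        answers = llm(f"Answer each subquestion briefly:\n{subqs}")
        # Contract: fold the answers back into one simpler standalone question.
        current = llm(
            "Fold these partial answers back into one simpler, self-contained "
            f"question:\n{answers}\nOriginal question: {current}"
        )
    return llm(f"Answer directly: {current}")
```

Because each iteration hands the next one a fresh, self-contained question, the model never re-reads an ever-growing chain of thought, which is where the compute savings come from.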

Spark-TTS
Spark-TTS is a highly efficient text-to-speech model built on large language models, featuring single-stream decoupled speech tokens. Leveraging the LLM's power, it reconstructs audio directly from the codes the model predicts, omitting a separate acoustic feature generation stage and thereby improving efficiency and reducing complexity. The model supports zero-shot text-to-speech, covering cross-lingual and code-switching scenarios, making it well suited to speech synthesis applications that demand high naturalness and accuracy. It also supports virtual voice creation: users can generate different voices by adjusting parameters such as gender, pitch, and speaking rate. The model aims to address the inefficiency and complexity of traditional speech synthesis systems, providing an efficient, flexible, and powerful solution for research and production. It is currently intended primarily for academic research and legitimate applications such as personalized speech synthesis, assistive technologies, and language research.
Text to Speech
112.3K

Level-Navi Agent Search
Level-Navi Agent is an open-source, general-purpose web search agent framework that decomposes complex problems and progressively searches the internet until it can answer the user's question. Its accompanying Web24 dataset, covering five domains (finance, games, sports, movies, and events), provides a benchmark for evaluating model performance on search tasks. The framework supports zero-shot and few-shot learning, offering a useful reference for applying large language models to Chinese web search agents.
AI search
58.5K

M2RAG
M2RAG is a benchmark codebase for retrieval-augmented generation in multimodal contexts. It evaluates the ability of multimodal large language models (MLLMs) to leverage knowledge from multimodal contexts by having them answer questions over retrieved multimodal documents. Models are evaluated on tasks such as image captioning, multimodal question answering, fact verification, and image re-ranking, with the aim of improving their effectiveness in multimodal in-context learning. M2RAG gives researchers a standardized testing platform to help advance multimodal language models.
AI Model
56.6K

SWE-RL
SWE-RL is a reinforcement learning-based LLM reasoning technique from Facebook Research that leverages open-source software evolution data to improve model performance on software engineering tasks. It optimizes the model's reasoning through a rule-driven reward mechanism (sketched after this entry), enabling it to better understand and generate high-quality code. SWE-RL's main strengths are its innovative reinforcement learning approach and its effective use of open-source data, opening new possibilities for the field. The technique is still at the research stage with no defined commercial pricing, but it shows significant potential for improving development efficiency and code quality.
Coding Assistant
58.8K
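As a rough illustration of what a rule-driven reward can look like, the sketch below scores a generated patch by its textual similarity to the ground-truth patch using Python's standard `difflib`. The format penalty and exact rules are assumptions for illustration, not the released code:

```python
# Rule-based reward sketch: no learned reward model, just string similarity
# between the predicted patch and the oracle patch from the repo's history.
import difflib

def patch_reward(predicted_patch: str | None, oracle_patch: str) -> float:
    if predicted_patch is None:      # unparseable or malformed model output
        return -1.0                  # assumed format penalty
    # Similarity in [0, 1] serves as the scalar RL reward signal.
    return difflib.SequenceMatcher(None, predicted_patch, oracle_patch).ratio()

print(patch_reward("-if x:\n+if x is not None:", "-if x:\n+if x is not None:"))  # 1.0
```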

TableGPT2-7B
TableGPT2-7B is a large-scale decoder model developed by Zhejiang University, designed for data-intensive tasks, particularly the interpretation and analysis of tabular data. Built on the Qwen2.5 architecture, it is optimized through Continuous Pretraining (CPT) and Supervised Fine-tuning (SFT) to handle complex table queries and business intelligence (BI) applications (see the usage sketch after this entry). It supports Chinese queries and suits enterprises and research institutions that need to process structured data efficiently. The model is currently open source and free; more specialized versions may follow.
Data Analysis
62.9K
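Since TableGPT2-7B is built on Qwen2.5, standard transformers chat inference should apply. The repository id and the CSV-in-prompt format below are assumptions for illustration:

```python
# Usage sketch: query a table in plain text through the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tablegpt/TableGPT2-7B"  # assumed Hugging Face repo id; verify before use
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

table = "city,revenue\nHangzhou,120\nShanghai,340"
messages = [{"role": "user",
             "content": f"Given this CSV table:\n{table}\nWhich city has higher revenue?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs.to(model.device), max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```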

Coding Tutor
Coding-Tutor is a programming tutoring tool based on a large language model (LLM), designed to help learners improve their programming skills through conversational interaction. It addresses key challenges in tutoring via a Trace-and-Verify (Traver) workflow that combines knowledge tracing with turn-by-turn verification (sketched after this entry). The tool is not limited to programming education; it extends to other tutoring scenarios, adapting teaching content to the learner's knowledge level. The project is open source and welcomes community contributions.
Education
62.7K
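A minimal sketch of one Trace-and-Verify style tutoring turn, assuming an abstract `llm(prompt) -> str` function. The prompts and control flow are illustrative, not the project's actual Traver implementation:

```python
# One tutoring turn: trace the learner's knowledge state, draft a hint
# conditioned on it, then verify the hint before sending.
from typing import Callable

def tutoring_turn(history: list[str], llm: Callable[[str], str]) -> str:
    dialogue = "\n".join(history)
    # Knowledge tracing: summarize what the learner has and has not mastered.
    state = llm(f"From this tutoring dialogue, list mastered and missing concepts:\n{dialogue}")
    # Draft a tutor response conditioned on the traced state.
    reply = llm(f"Learner state:\n{state}\nWrite the next tutoring hint for:\n{dialogue}")
    # Verify: the hint should target a missing concept without giving the answer away.
    verdict = llm("Does this hint address a missing concept without revealing the "
                  f"solution? Answer yes or no.\nState: {state}\nHint: {reply}")
    if verdict.strip().lower().startswith("yes"):
        return reply
    return llm(f"Rewrite the hint to target a missing concept more directly:\n{reply}")
```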
Chinese Picks

Tbox AI-Powered Intelligent Agent Builder
Tbox is a large language model-based product designed for Alipay's lifestyle ecosystem, aimed at enabling businesses to quickly build professional-grade intelligent agents and drive business growth. Integrating advanced technologies such as Ant Bailing LLM, Ant Tianjian, and Lingjing Digital Human, Tbox enables enhanced experiences and intelligent decision-making. Applicable to various industries such as public services, government affairs, travel, tourism, and healthcare, Tbox enhances user experience and operational efficiency through intelligent services. Pricing and specific positioning vary according to enterprise needs, providing customized solutions for businesses.
Intelligent Agent
82.2K

MoBA
MoBA (Mixture of Block Attention) is an innovative attention mechanism designed for large language models handling long contexts. It divides the context into blocks and lets each query token learn to attend to the most relevant blocks, enabling efficient long-sequence processing (see the sketch after this entry). MoBA's main advantage is its ability to switch seamlessly between full and sparse attention, preserving performance while improving computational efficiency. It suits tasks that process long texts, such as document analysis and code generation, and can significantly cut computational cost while maintaining model quality. MoBA's open-source implementation gives researchers and developers a powerful tool, advancing the use of large language models in long-text processing.
Model Training and Deployment
53.8K
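The core routing idea can be sketched in a few lines of PyTorch: summarize each block of keys, let every query pick its top-k blocks, and mask attention everywhere else. This single-head toy omits MoBA's causal masking, per-head routing, and fused kernels:

```python
# Toy block-sparse attention in the spirit of MoBA (not the official code).
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block: int = 64, topk: int = 2):
    # q, k, v: (seq, dim); seq must be divisible by `block` in this sketch.
    seq, dim = k.shape
    nb = seq // block
    k_blocks = k.view(nb, block, dim).mean(dim=1)      # (nb, dim) block summaries
    scores = q @ k_blocks.T                            # (seq, nb) query-to-block affinity
    keep = scores.topk(topk, dim=-1).indices           # (seq, topk) routed blocks
    mask = torch.full((seq, nb), float("-inf"))
    mask.scatter_(1, keep, 0.0)                        # 0 for kept blocks, -inf otherwise
    token_mask = mask.repeat_interleave(block, dim=1)  # expand to (seq, seq) over tokens
    attn = F.softmax((q @ k.T) / dim ** 0.5 + token_mask, dim=-1)
    return attn @ v

out = block_sparse_attention(torch.randn(256, 32), torch.randn(256, 32), torch.randn(256, 32))
```

With `topk` blocks kept out of `nb`, each query scores only a fraction of the keys, which is where the long-context savings come from.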

Goedel-Prover
Goedel-Prover is an open-source large language model specializing in automated theorem proving. It significantly improves automated mathematical problem solving by translating natural-language mathematical statements into formal languages (such as Lean 4) and generating formal proofs (see the example after this entry). The model reaches a 57.6% success rate on the miniF2F benchmark, surpassing other open-source models. Its key advantages are high performance, open-source extensibility, and a deep understanding of mathematical problems. Goedel-Prover aims to advance automated theorem proving and to provide strong tooling for mathematical research and education.
Research Tools
58.5K
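For a flavor of the formalization target, here is the kind of statement-plus-proof such a prover is asked to produce in Lean 4 for a natural-language claim like "the sum of two even numbers is even". This is a hand-written example, not actual model output:

```lean
-- "The sum of two even numbers is even", formalized and proved in Lean 4.
theorem even_add_even (a b : Nat) (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k :=
  match ha, hb with
  | ⟨m, hm⟩, ⟨n, hn⟩ => ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```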

OmniParser V2.0
OmniParser, developed by Microsoft, is an advanced screen parsing technology that turns irregular screenshots into structured lists of elements, including the locations of interactable regions and functional descriptions of icons. It parses UI interfaces efficiently with deep learning models such as YOLOv8 and Florence-2 (see the sketch after this entry). Its main advantages are efficiency, accuracy, and broad applicability. OmniParser significantly improves LLM-based user interface agents, helping them understand and operate diverse UIs, and it performs well in scenarios such as automated testing and intelligent assistant development. Its open-source nature and flexible licensing make it a powerful tool for developers and researchers alike.
AI design tools
92.2K
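A sketch of the detection stage using the ultralytics YOLO API. The weights path is a placeholder, and the Florence-2 captioning stage is reduced to a stub here:

```python
# Detect interactable UI regions in a screenshot, then hand each crop to a
# captioner to build the structured element list an LLM agent can act on.
from ultralytics import YOLO

detector = YOLO("weights/icon_detect.pt")   # placeholder for fine-tuned YOLOv8 weights
results = detector("screenshot.png")        # run detection on a UI screenshot

elements = []
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # pixel coordinates of the region
    elements.append({
        "bbox": [x1, y1, x2, y2],
        "caption": "TODO: describe this crop with a captioner such as Florence-2",
    })
print(elements[:3])
```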

Mistral Small 24B Instruct 2501
Mistral Small 24B is a large language model developed by the Mistral AI team, featuring 24 billion parameters and supporting multilingual conversation and instruction handling. Through instruction tuning, it generates high-quality text for scenarios like chat, writing, and programming assistance. Its key advantages include strong language generation, multilingual support, and efficient inference. The model targets individuals and businesses needing high-performance language processing; it carries an open-source license and supports local deployment (sketched after this entry) and quantization optimizations, making it a good fit for scenarios with data privacy requirements.
Chatbot
61.8K
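A local inference sketch with transformers. The repository id follows the model's name but should be verified before use; bf16 weights for 24B parameters need roughly 48 GB of memory, so quantized builds are common on smaller GPUs:

```python
# Minimal local chat inference via the transformers pipeline.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="mistralai/Mistral-Small-24B-Instruct-2501",  # assumed Hugging Face id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [{"role": "user", "content": "Explain retrieval-augmented generation in two sentences."}]
out = chat(messages, max_new_tokens=80)
print(out[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```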
Fresh Picks

MNN Large Model Android App
Developed by Alibaba, the MNN Large Model Android App is an Android application built on large language models (LLMs). It supports multiple input and output modalities, including text generation, image recognition, and audio transcription. Its inference is optimized to run efficiently on mobile devices while protecting user data privacy, with all processing done locally. It supports models from a range of leading providers, such as Qwen, Gemma, and Llama, making it suitable for many scenarios.
AI Model
170.0K
Chinese Picks

Doubao 1.5 Pro
Developed by the Doubao team, Doubao-1.5-pro is a high-performance sparse MoE (Mixture of Experts) large language model. It achieves a strong balance between model quality and inference cost through an integrated training-inference design, and it performs well across public benchmarks, with notable advantages in inference efficiency and multimodal capability. The model suits scenarios requiring efficient inference and multimodal interaction, such as natural language processing, image recognition, and speech interaction. Its technical foundation is a sparsely activated MoE architecture (see the toy example after this entry) that optimizes the activated-parameter ratio and training algorithm to achieve higher performance leverage than traditional dense models. It also supports dynamic parameter adjustment to fit diverse applications and cost requirements.
AI Model
458.2K
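The sparse-activation idea is generic, so a toy top-k MoE layer in PyTorch (not ByteDance's implementation) shows why active parameters stay far below total parameters: each token runs only `k` of the expert FFNs:

```python
# Toy top-k sparse MoE layer: a router picks k experts per token, and only
# those experts' weights are exercised for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=512, n_experts=16, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                                # x: (tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)
        topw, topi = weights.topk(self.k, dim=-1)        # route each token to k experts
        topw = topw / topw.sum(dim=-1, keepdim=True)     # renormalize kept weights
        out = torch.zeros_like(x)
        for slot in range(self.k):                       # dense loop for clarity only
            for e, expert in enumerate(self.experts):
                hit = topi[:, slot] == e
                if hit.any():
                    out[hit] += topw[hit, slot, None] * expert(x[hit])
        return out

y = SparseMoE()(torch.randn(8, 512))
```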

DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1-Distill-Llama-70B is a large language model developed by the DeepSeek team, based on the Llama-70B architecture and optimized through reinforcement learning. It excels in reasoning, dialogue, and multilingual tasks, supporting diverse applications such as code generation, mathematical reasoning, and natural language processing. Its primary advantages include efficient reasoning capabilities and problem-solving skills for complex tasks, while also supporting both open-source and commercial use. This model is suitable for enterprises and research institutions that require high-performance language generation and reasoning abilities.
AI Model
86.1K

InternVL2.5-78B-MPO
InternVL2.5-MPO is a series of multimodal large language models built on InternVL2.5 with Mixed Preference Optimization (MPO). It excels at multimodal tasks by combining the incrementally pre-trained InternViT with pre-trained large language models (LLMs) such as InternLM 2.5 and Qwen 2.5 through a randomly initialized MLP projector. The series is trained on MMPR, a multimodal reasoning preference dataset of roughly 3 million samples; its data construction pipeline and mixed preference optimization together improve the model's reasoning ability and answer quality.
AI Model
59.1K

InternLM3-8B-Instruct
Developed by the InternLM team, InternLM3-8B-Instruct is a large language model with strong reasoning ability and proficiency in knowledge-intensive tasks. Trained on only 4 trillion high-quality tokens, it cuts training costs by more than 75% compared with models of similar scale, yet outperforms models such as Llama3.1-8B and Qwen2.5-7B on multiple benchmarks. It offers a deep reasoning mode for complex inference tasks alongside smooth conversational interaction. The model is open-sourced under the Apache-2.0 license and suits applications needing efficient reasoning and knowledge processing.
AI Model
53.3K

Dria-Agent-a-3B
Dria-Agent-a-3B is a large language model based on the Qwen2.5-Coder series, specifically tailored for agent applications. It employs a Pythonic approach to function calling, featuring advantages such as concurrent multi-function calls, free-form reasoning and actions, and instant complex solution generation. The model has excelled in various benchmarks, including the Berkeley Function Calling Leaderboard (BFCL), MMLU-Pro, and the Dria-Pythonic-Agent-Benchmark (DPAB). It has 3.09 billion parameters and supports the BF16 tensor type.
Development and Tools
53.5K

Dria-Agent-a-7B
Dria-Agent-a-7B is a large language model trained on the Qwen2.5-Coder series, specializing in agent applications. It utilizes a Pythonic function calling approach, offering advantages such as simultaneous multipurpose function calls, free-form reasoning and actions, and instant complex solution generation compared to traditional JSON function calls. The model has demonstrated excellent performance across various benchmarks, including the Berkeley Function Calling Leaderboard (BFCL), MMLU-Pro, and the Dria-Pythonic-Agent-Benchmark (DPAB). With 7.62 billion parameters and employing BF16 tensor type, it supports text generation tasks. Its key benefits include powerful programming assistance, efficient function calling methods, and high accuracy in specific domains. The model is suitable for applications requiring complex logic processing and multi-step task execution, such as automated programming and intelligent agents. Currently, it is available for free use on the Hugging Face platform.
Coding Assistant
52.2K

Dria-Agent-α
Dria-Agent-α is a large language model (LLM) tool-interaction framework introduced by Hugging Face. By invoking tools through Python code rather than traditional JSON, it makes fuller use of the LLM's reasoning and lets the model solve complex problems in a style closer to human reasoning (compare the sketch after this entry). The framework benefits from Python's popularity and pseudo-code-like syntax, improving LLM performance in agent scenarios. Dria-Agent-α was developed with a synthetic data generation tool called Dria, which uses a multi-stage pipeline to produce realistic scenarios that train the model for complex problem solving. Two models, Dria-Agent-α-3B and Dria-Agent-α-7B, are currently available on Hugging Face.
Development & Tools
54.4K
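To see why Pythonic calling is attractive compared with emitting one JSON object per tool call, note that the model can chain results in a single emission. The sketch below is illustrative: the tools are made up, and a real host must properly sandbox the `exec`:

```python
# Pythonic function calling: the model emits plain Python that composes
# tools; the host executes it against a whitelisted namespace.
def get_weather(city: str) -> str:
    return f"22C and clear in {city}"          # stand-in tool implementation

def send_message(to: str, body: str) -> str:
    return f"sent to {to}: {body}"             # stand-in tool implementation

# What a Pythonic-calling model might emit (two chained calls, free-form logic):
model_output = """
weather = get_weather("Istanbul")
result = send_message("alice", f"Forecast: {weather}")
"""

namespace = {"get_weather": get_weather, "send_message": send_message}
exec(model_output, {"__builtins__": {}}, namespace)   # real hosts need stronger sandboxing
print(namespace["result"])
```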

Llama-3-Patronus-Lynx-8B-Instruct-Q4_K_M-GGUF
This model is a quantized large language model that uses 4-bit quantization (GGUF Q4_K_M) to reduce storage and computational requirements. With 8.03 billion parameters, it is free for non-commercial use and well suited to high-performance language applications in resource-constrained environments (see the usage sketch after this entry).
AI Model
56.0K
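A quantized GGUF build like this is typically run through llama.cpp bindings. The file path below is a placeholder for the downloaded checkpoint, and the size figure is an estimate: Q4_K_M trims an 8B model from roughly 16 GB in fp16 to around 5 GB at a modest quality cost:

```python
# Run a Q4_K_M GGUF checkpoint locally with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="models/patronus-lynx-8b-instruct-q4_k_m.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what Q4_K_M quantization trades off."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```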

InternVL2.5-38B-MPO
InternVL2.5-MPO is an advanced series of large multimodal language models built on InternVL2.5 and Mixed Preference Optimization (MPO). This series excels in multimodal tasks, capable of processing image, text, and video data while generating high-quality text responses. The model employs a 'ViT-MLP-LLM' paradigm, optimizing visual processing capabilities through pixel unshuffle operations and dynamic resolution strategies. Furthermore, it supports multiple images and video data, further expanding its application scenarios. In multimodal capability assessments, InternVL2.5-MPO surpasses numerous benchmark models, affirming its leadership in the multimodal field.
AI Model
62.4K

Agent Laboratory
Agent Laboratory is a project developed by Samuel Schmidgall and others, aiming to support researchers throughout the entire research process—from literature review to experimental execution to report writing—through dedicated agents driven by large language models. It is not intended to replace human creativity, but rather to complement it, allowing researchers to concentrate on conceptualization and critical thinking while automating repetitive and time-consuming tasks such as coding and documentation. The tool's source code is licensed under the MIT License, permitting the use, modification, and distribution of the code provided that the terms of the MIT License are followed.
Research Tools
62.1K

InternVL2.5-26B-MPO-AWQ
InternVL2_5-26B-MPO-AWQ is a multimodal large language model developed by OpenGVLab, designed to strengthen reasoning through Mixed Preference Optimization (MPO). It excels at multimodal tasks, effectively handling the complex relationships between images and text, and its model architecture and optimization techniques give it clear advantages on multimodal data. It fits scenarios requiring efficient processing and understanding of multimodal data, such as image description generation and multimodal question answering; its main strengths are strong reasoning capability and an efficient architecture.
AI Model
50.5K

AnyParser Pro
AnyParser Pro is an innovative document parsing tool developed by CambioML. Utilizing large language model (LLM) technology, it quickly and accurately extracts complete textual content from PDF, PPT, and image files. The main advantages of this technology lie in its efficient processing speed and high precision in parsing, significantly enhancing document processing efficiency. Background information indicates that it was launched by CambioML, a startup incubated by Y Combinator, aimed at providing users with a simple, user-friendly, and powerful document parsing solution. Currently, the product offers a free trial, and users can access its features by obtaining an API key.
Document
62.4K
Fresh Picks

VITA-1.5
VITA-1.5 is an open-source multimodal large language model designed to enable near real-time visual and speech interaction. It significantly reduces interaction latency and enhances multimodal performance, providing users with a smoother interaction experience. The model supports both English and Chinese and is applicable to various scenarios, including image recognition, speech recognition, and natural language processing. Its key advantages include efficient speech processing capabilities and robust multimodal understanding.
AI Model
60.2K
# Featured AI Tools
English Picks

Jules AI
Jules is an asynchronous coding agent that automatically handles tedious coding tasks so you can spend your time on the core of your code. Its main strength is its GitHub integration: it automates pull requests (PRs), runs tests, and verifies code on cloud virtual machines, greatly improving development efficiency. Jules suits a wide range of developers and is especially helpful to busy teams in managing project and code quality.
Development & Programming
48.9K

NoCode
NoCode is a platform that requires no programming experience: users describe their ideas in natural language and quickly generate applications, lowering the barrier to development so more people can bring their ideas to life. The platform offers real-time preview and one-click deployment, and it is designed to be very easy to use even for users without technical knowledge.
Development Platform
44.7K

ListenHub
ListenHub is a lightweight AI podcast generator supporting Chinese and English. Using cutting-edge AI, it quickly generates podcast content on topics users care about. Its main strengths are natural dialogue and very high-quality audio, delivering a premium listening experience anytime, anywhere. ListenHub not only speeds up content generation but also works well on mobile devices, making it easy to use in many situations. It is positioned as an efficient tool for information consumption, serving a broad range of listeners.
AI
42.8K
Chinese Picks

Tencent Hunyuan Image 2.0
Tencent Hunyuan Image 2.0 is Tencent's newest AI image generation model, with substantial gains in generation speed and image quality. It adopts an ultra-high-compression encoder-decoder and a new diffusion architecture, reaching millisecond-level image generation and avoiding the long waits of conventional generation. By integrating reinforcement learning with human aesthetic knowledge, it also improves image realism and detail rendering, making it a good fit for professional users such as designers and creators.
Image Generation
43.3K

Openmemory MCP
OpenMemoryはオープンソースの個人向けメモリレイヤーで、大規模言語モデル(LLM)に私密でポータブルなメモリ管理を提供します。ユーザーはデータに対する完全な制御権を持ち、AIアプリケーションを作成する際も安全性を保つことができます。このプロジェクトはDocker、Python、Node.jsをサポートしており、開発者が個別化されたAI体験を行うのに適しています。また、個人情報を漏らすことなくAIを利用したいユーザーにお勧めします。
オープンソース
45.0K

FastVLM
FastVLM is an efficient vision encoding model designed for vision-language models. Its innovative FastViTHD hybrid vision encoder reduces both the encoding time for high-resolution images and the number of output tokens, improving model throughput and accuracy. FastVLM is positioned to give developers powerful vision-language processing capabilities, and it performs especially well on mobile devices where fast response matters.
Image Processing
42.8K
English Picks

Pika
Pika is a video creation platform where users upload their creative ideas and AI automatically generates videos based on them. Its main features are video generation from diverse ideas, professional-grade video effects, and simple, easy-to-use operation. It offers a free trial and targets creators and video enthusiasts.
Video Production
17.6M
Chinese Picks

LiblibAI
LiblibAI is a leading Chinese AI creation platform that provides powerful AI creation capabilities to support creators. The platform offers a vast number of free AI creation models that users can search and use to create images, text, and audio, and it also supports training custom AI models. Aimed at a broad community of creators, it seeks to provide equal creative opportunity and contribute to the creative industry so that everyone can enjoy the pleasure of creating.
AI Model
6.9M